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Abstract 

This  thesis  investigates  applications  of  multi- reference  frame  image  registration 
for  image  sets  with  various  translation,  rotation,  and  scale  combinations.  It  focuses 
on  registration  accuracy  improvement  over  traditional  pairwise  registration,  and  also 
compares  the  quality  of  scene  estimation  from  frame  averaging.  Three  experiments  are 
developed  which  use  cross-correlation  to  estimate  translation,  the  Radon  transform 
to  estimate  translation  and  rotation,  and  the  Fourier-Mellin  transform  to  estimate 
translation,  rotation,  and  scale.  Results  from  applying  multi-reference  frame  registra¬ 
tion  in  these  experiments  show  distinct  improvements  in  both  registration  accuracy 
and  quality  of  frame  averaging  compared  to  single-reference  frame  registration.  Fur¬ 
thermore,  it  is  shown  that  the  new  registration  technique  is  equivalent  to  the  optimal 
Gauss-Markov  estimator  of  the  relative  shifts  given  all  pairwise  shifts. 
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Multi-Reference  Frame  Image  Registration 


for  Rotation,  Translation,  and  Scale 

I.  Introduction 

Multiple  images  of  a  target  or  scene  captured  from  a  single  sensor,  are  generally 
distorted  from  one  image  to  the  next.  The  severity  and  type  of  distortion 
directly  depend  on  the  sensor,  target,  and  information  acquired.  The  term  “image” 
commonly  refers  to  capturing  information  of  a  specific  target  or  scene  using  the  visible 
range  of  the  electromagnetic  spectrum  or  light.  However,  other  forms  of  information 
could  be  gathered  and  simply  converted  to  a  visual  representation.  Virtually  anything 
with  varying  magnitude  can  be  mapped  to  a  range  of  colors,  such  as  infrared  radiation 
for  thermal  imaging  or  sound  waves  for  ultrasound  imaging. 

The  term  “image”  should  be  more  loosely  defined  as  any  visual  representation 
of  information.  The  information  gathered  could  also  have  several  dimensions  of  visual 
significance,  as  in  hyperspectral  imaging.  Hyperspectral  imaging  is  a  fairly  new  field 
of  imaging  science,  but  as  the  name  suggests,  it  contains  many  dimensions  of  infor¬ 
mation,  and  there  could  be  several  hundred  dimensions  of  data.  The  distortions  for 
a  hyperspectral  image  could  be  completely  different  than  the  distortions  in  an  image 
of  visible  light. 

To  maintain  generality,  this  thesis  refers  to  an  image  as  any  two-dimensional  (2- 
D)  visual  representation  of  information  taken  from  a  single  sensor  and  analyzes  only 
linear  geometric  distortions,  specifically  translation,  rotation,  and  scale  and  combi¬ 
nations  thereof.  These  distortions  mostly  stem  from  a  simple  principle:  unless  the 
sensor  and  target  are  completely  stationary,  the  act  of  repeatedly  capturing  images 
introduces  relative  random  distortion. 
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1.1  Geometric  Image  Distortion 

For  images  captured  at  a  relatively  fast  rate,  the  distortions  are  most  likely 
caused  by  spatial  changes.  Even  if  the  images  are  captured  quickly,  there  could  still 
be  temporal  changes  that  cause  relative  distortion  between  images. 

Relative  translative  distortion  is  present  in  any  set  of  images  that  are  captured 
from  non-stationary  sensors  or  targets.  A  familiar  example  involves  the  simple  act  of 
a  person  photographing  multiple  pictures  of  the  same  object.  Random  shaking  of  the 
hand  causes  small  differences  in  where  the  camera  is  pointing.  The  images  collected 
from  the  photographs  also  have  rotational  distortions  because  as  the  photographer 
pushes  the  button  to  take  a  picture  the  camera  tilts  and  twists  slightly,  introducing 
random  relative  rotation.  Any  non-stationary  sensor  could  introduce  translative  or 
rotational  distortion  when  capturing  multiple  images. 

For  images  with  significant  differences  in  scaling,  the  sensor  conlcl  be  moving 
relative  to  the  target  or  vice  versa.  For  example,  an  airborne  sensor  may  introduce 
relative  scaling.  If  a  sensor  is  mounted  on  a  airborne  platform  that  is  flying  towards  a 
target,  each  successive  capture  of  the  target  is  at  time  when  the  platform  is  actually 
closer  to  the  target.  This  discrepancy  in  distance  from  one  capture  to  next  results 
in  relative  scaling.  The  airborne  sensor  undoubtedly  has  translational  and  rotational 
distortion  to  some  degree.  It  is  difficult  to  completely  stabilize  a  sensor,  especially  if 
it  is  mounted  to  a  mobile  platform. 

Distorted  images  are  not  only  caused  by  sensors  or  even  targets;  distortion  may 
be  independent  of  the  sensor.  The  image  may  also  be  distorted  by  some  form  of  post 
processing  or  intentional  manipulation,  such  as  encryption  or  digital  watermarking  [1]. 
The  foregoing  is  not  an  exhaustive  account  of  the  causes  of  distortion:  the  focus  here 
is  not  the  cause  of  geometric  distortion  but  rather  correcting  the  effects  of  distortion. 

1.1.1  Mathematical  Models.  Each  distortion  or  combinations  of  distortions 
can  be  modeled  mathematically.  Let  f(x,y)  be  a  2-D  image.  A  2-D  image  fT(x,y) 
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(a)  f(x,y)  (b)  fT(x,y) 

Figure  1.1:  Convention  for  the  model  of  an  image  translated  by  [tx,  tv\. 

translated  by  the  vector  [rx,  ry]  is 

fT(x,y)  =  f(x  -rx,y  -Ty),  (1.1) 

where  tx  is  the  horizontal  translation  and  ry  is  the  vertical  translation,  as  in  Figure  1.1. 
We  assume  that  an  image  can  only  be  translated  half  as  much  as  its  width  in  any 
dimension  otherwise,  the  translated  image  would  have  more  new  information  than 
information  that  is  in  common  with  the  original  image  f(x,y).  A  2-D  image  f,j,{x,y) 
rotated  counter-clockwise  about  its  origin  by  an  angle  0  is 

/^(x,  y)  =  f  {x  cos  0  +  y  sin  0,  y  cos  0  —  x  sin  0) ,  (1.2) 

where  0  G  [0,27t],  as  in  Figure  1.2.  Finally,  an  image  fp(x,y)  scaled  by  a  positive 
factor  of  0  is 

fp(x,y)  =  f(Px,  0y).  (1.3) 

If  0  >  1  then  fp(x,y)  is  smaller  than  f(x,y).  Conversely,  if  0  <  1  then  fp(x,y)  is 
larger  than  f(x,y). 
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(a)  f(x,y) 


Figure  1.2:  Convention  for  the  model  of  an  image  rotated  by  (j). 


Mathematical  models  of  combinations  of  distortions  are  more  complicated  be¬ 
cause  the  order  of  distortions  can  change  the  model,  e.g.,  there  is  a  difference  in  the 
model  for  an  image  that  is  first  translated  and  then  rotated  versus  an  image  that  is 
first  rotated  and  then  translated.  An  image  translated  by  the  vector  [tx ,  ry]  and  then 
rotated  by  (j)  is 


fr,<t>(x,y)  =  f((x-rx)  cos0+  (y  -  Ty)  sin0, -{x  -  rE)sin0+  (y  -  ry)  cos 0),  (1.4) 

where  as  an  image  rotated  by  0  and  then  translation  by  the  vector  [rx,  ry\  is 

f<j>,r(x,y)  =  f  {x  cos  cf>  -\-  y  sin  0  rx,  -xsin0  +  ycoscp  -  ry).  (1.5) 

These  two  models  result  in  different  images  when  the  same  [tx,  ry]  and  (j)  are  used,  as 
in  Figure  1.3.  The  order  of  distortions  is  denoted  by  the  order  of  the  indices  of  /.  For 
example,  an  image  translated  and  then  rotated  is  denoted  fTt^(x,y),  and  an  image 
rotated  and  then  translated  image  is  denoted  f<]>,T(x,y). 
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Figure  1.3:  Illustration  showing  the  difference  between  models  for  an  image  that 

is  translated  and  then  rotated,  fT^(x,y),  versus  an  image  that  is  rotated  and  then 
translated,  f<j>,T(x,y).  Each  model  has  the  same  values  for  [rx , ry\  and  0  and  are 
overlayed  to  highlight  the  differences. 
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There  are  six  combinations  of  translation,  rotation,  and  scale.  The  models  for 


all  six  combinations  are 

fn,rAx,y)  =  /((/3x-rx)cos0+  (/%-Ty)sin0,  -(flx  -  rx)  sinfl  +  (fly  -  ry)  cos  fl) 

(1.6) 

f/3,ct>Ax->y)  =  f(flxcosfl  +  fly  sin  fl  —  rx , -flxsinfl  +  fly  cos  fl  -  ry)  (1.7) 

U,pAxiy)  =  f  (flix  -  Tx) cos  (fi  +  fl(y  -  Ty)  sin0>  ~P(X  -  Tx)  sinfl  +  fl{y-  ry)  cosfl) 

(1.8) 

fr,<t>Axi  y)  =  f(P[(x  -  Tx)  cos  fl  +  {y-  Ty)  sin  <t>],  fl[-(x  -  Tx)  sin  fl  +  (y  -  rv)  cos  </>]) 

(1.9) 

U,0,r(x,y)  =  f  (fl[x  cos  (f)  +  y  sin (j)\  -  tx,  fl[-x  sinfl  +  y  cos  fl\  -  r^)  (1.10) 

U,r,p(x,y)  =  f  (fl[x  cos  fl  +  y  sin (f>-  rx],  fl[~x  sinfl  +  y  cos  fl  -  ry])  (1.11) 

Note  that  fT,pAx->y )  =  fr,<j),0{x,y).  Aside  from  the  obvious  mathematical  differences 
the  models  are  essentially  the  same  they  each  model  an  image  that  has  been  translated, 
rotated,  and  scaled. 

1.2  Image  Registration  and  Applications 

It  is  useful  to  compare  a  data  set  of  multiple  images  of  the  same  target  in  order 
to  gather  more  information  than  a  single  image  can  provide.  The  relative  distor¬ 
tions  between  images  could  inhibit  proper  comparison  or  processing  of  the  images 
and,  in  turn,  adversely  affect  analysis  and  conclusions.  Image  registration  determines 
parameters  of  the  geometric  distortion(s)  relating  two  images  and  then  removes  them. 

Frame  averaging  is  a  common  technique  that  uses  registration  to  average  a  set 
of  commonly  aligned  images;  it  is  a  primary  focus  of  this  thesis.  If  the  distortions 
are  not  removed,  then  the  average  of  the  image  could  produce  a  worse  composite 
image.  The  goal  of  frame  averaging  is  to  generate  an  image  with  more  detail.  Image 
registration  can  also  be  used  to  stabilize  video.  It  is  difficult  to  extract  valuable 
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information  from  video  that  is  unstable  or  shaky,  and  human  perception  is  sensitive 
to  such  instabilities.  Image  matching  is  another  technique  that  uses  image  registration 
to  determine  the  presence  of  a  known  object  or  to  determine  if  a  set  of  images  contain 
the  same  object.  If  the  object  in  a  set  of  images  is  distorted,  it  could  be  difficult  to 
determine  if  each  image  contains  the  same  object.  Image  matching  is  commonly  used 
in  digital  watermarking  to  determine  if  a  pair  of  images  contain  a  known  watermark, 
but  often  the  watermark  is  distorted  and,  therefore,  image  registration  is  needed  to 
properly  match  the  images. 

Traditionally,  image  registration  is  performed  using  a  single  reference  image  to 
which  all  other  images  are  compared.  This  procedure  results  in  registration  estimates 
that  are  inherently  biased  to  the  reference  image.  Also,  there  is  information  in  the 
data  that  is  not  used.  A  registration  technique  is  needed  that  is  unbiased  and  utilizes 
all  available  information.  Multi-reference  frame  image  registration  has  exactly  this 
function  [2]. 

1.3  Research  Objective 

The  objective  of  this  thesis  is  to  further  investigate  the  work  of  Bruckart  [2] 
in  the  area  of  multi-reference  frame  image  registration  by  expanding  its  applications 
to  include  not  only  translation  but  also  rotation  and  scale.  This  research  focuses  on 
multi-reference  frame  registration  accuracy  improvements  over  traditional  registra¬ 
tion  and  its  implications,  while  also  comparing  the  quality  of  scene  estimation  from 
frame  averaging.  The  results  will  demonstrate  the  advantages  of  multi-reference  frame 
registration  and  its  possible  applications.  To  accomplish  this  objective,  several  exper¬ 
iments  are  developed  that  use  combinations  of  geometric  distortions  and  that  apply 
different  registration  algorithms  to  estimate  distortions.  The  aim  of  this  research  is 
to  compare  the  performance  of  the  multi-reference  and  single  reference  frame  regis¬ 
trations  methods  and  the  quality  of  their  respective  frame  averaging  processes. 
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1.4  Resources 

All  coding  of  registration  algorithms  and  post-processing  of  results  are  per¬ 
formed  using  Matlab®  version  7.4.0287  (R2007a)  on  a  dual  processor  Intel  Zeon  3.6 
Ghz  with  3  GB  of  memory  using  Windows  XP  Service  Pack  2.  All  simulations  are 
performed  using  Matlab®  version  7.1.0.183  (R14)  Service  Pack  3  on  a  Linux  High 
Performance  Computing  (HPC)  cluster,  which  has  64  nodes  (64  bit)  and  128  Opteron 
248  2.2  GHz  CPUs  with  4  GB  of  memory  per  CPU. 


II.  Review  of  Registration  Techniques 


A  majority  of  the  literature  on  image  registration  is  based  on  a  single  reference 
frame  approach  where  one  frame,  the  reference  image,  is  compared  or  matched 
to  another  frame,  the  secondary  image.  Image  registration  for  a  set  of  frames  is 
performed  by  repeating  this  process  for  different  secondary  images,  using  the  same 
reference  image,  hence  the  name  “single  reference  frame  method”.  When  only  one 
reference  image  is  used,  registration  estimates  are  biased  to  the  reference  image.  If 
this  image  is  heavily  corrupted  by  noise  or  other  interference,  the  results  of  the  regis¬ 
tration  are  inaccurate.  However,  under  most  circumstances  the  single  reference  frame 
approach  works  well  and  is  computationally  less  taxing  than  a  multi-reference  frame 
approach. 

2.1  Single  Reference  Frame  Image  Registration 

One  common  and  effective  single  reference  frame  technique  for  image  registra¬ 
tion  is  based  on  the  2-D  cross-correlation  [3].  The  cross-correlation  technique  requires 
complicated  calculations  when  the  geometric  transforms  between  the  two  images  in¬ 
clude  rotation  and  scaling.  Another  technique  uses  moments  and  moment  invariants 
for  image  matching.  Moments  are  sensitive  to  noise,  moment  invariants  have  limited 
discriminating  power,  and  high-order  moments  require  extensive  computations  [4], 
Image  matching  using  symmetric  phase-only  matched  filters  (SPOMF)  have  been 
shown  to  be  computationally  efficient,  to  be  robust  against  noise,  and  to  have  high 
discriminating  power  (sharpness  in  correlation  peak),  but  they  are  sensitive  to  varia¬ 
tion  in  rotation  and  scale  [5]  [6]. 

Ideally,  techniques  are  needed  that  are  invariant  to  some  of  the  geometric  trans¬ 
forms  relating  two  images,  thus  reducing  complexity  and  dimensionality.  An  invariant 
technique,  if  applied  to  a  distorted  image,  results  in  an  output  that  is  equivalent  to 
the  undistorted  input  image,  assuming  that  the  technique  is  invariant  for  the  spe¬ 
cific  distortions  of  the  image.  One  such  technique  is  the  Fourier-Mellin  transform 
(FMT)  [1]  [7].  The  FMT  is  invariant  to  translation,  rotation,  and  scale,  making  it 
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an  ideal  image  matching  technique.  Another  such  technique  is  the  Radon  Transform 
(RT)  [8]  [9]  [10]  [11].  This  thesis  focuses  on  registration  using  cross-correlation,  FMT, 
and  RT. 


2.1.1  Cross-Correlation.  Cross-correlation  is  commonly  used  in  many  signal 
processing  applications  such  as  image  registration  and  pattern  recognition.  It  mea¬ 
sures  the  similarity  in  shape  of  two  signals.  The  un-normalized  cross-correlation  of 
two  discrete  vectors  g(n )  and  h(n)  both  of  size  N  is 


c(m){g,h} 


f  m 

g(n)h*(n  +  (N  —  m)),  1  <  m  <  N 

n= 1 
27V— m 

g{n  +  (m  —  N))h*(n ),  N  +  1  <  m  <  2N  —  1. 


(2.1) 


If  one  vector  is  shorter  than  the  other,  then  the  shortest  is  zero  padded  to  the  length 
of  the  other  vector.  Cross-correlation  computes  a  sum  of  the  product  of  the  two 
vectors  for  all  possible  combinations  of  overlap.  The  location  of  the  maximum  of  the 
cross-correlation  can  be  used  to  estimate  the  translation  between  two  vectors. 

If  the  cross-correlation  is  used  for  estimating  translation  between  two  discrete 
vectors  and  a  reasonable  estimation  for  the  maximum  possible  translation  in  either 
direction,  fmax.  is  known,  then  the  cross-correlation  can  be  modified  to  possibly  in¬ 
crease  accuracy.  Using  one  vector  as  a  reference  vector,  the  non-reference  vector  is 
windowed  by  removing  samples  from  the  ends,  where  the  amount  removed  depends 
on  the  estimated  maximum  possible  translation.  For  example,  if  the  two  vectors  are 
128  samples  and  the  estimated  maximum  shift  is  fmax  =  10  samples,  then  10  samples 
are  removed  from  each  end  of  the  non-reference  vector,  resulting  in  a  reference  vector 
with  128  samples  and  another  vector  of  108  samples. 

Here  equation  (2.1)  does  not  apply  because  the  shorter  vector  should  not  be 
zero  padded  and  partial  overlaps  should  not  be  calculated.  Not  calculating  partial 
overlap  has  the  added  benefit  of  reducing  the  number  of  computations  required  to 
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perform  the  correlation.  The  windowed  cross-correlation  of  two  discrete  vectors  g(ri) 
and  h(p)  of  size  N  and  P,  respectively,  is 

N 

cWin(m){g,  h}  =  y 'g(n)h*(n  +  m  -  1),  1  <m<  2fmax  +  1,  (2.2) 

71=1 

where  N  =  P  —  2fmax  and  N  >  4 fmax.  The  computations  of  the  windowed  cross¬ 
correlation  is  reduced  to  2fmax  +  1  compared  to  the  2N  —  1  computations  of  equa¬ 
tion  (2.1). 

To  possibly  even  further  improve  the  performance  of  cross-correlation  for  esti¬ 
mating  translation  between  vectors,  the  vectors  can  be  normalized.  There  are  many 
ways  to  normalize  vectors,  but  the  focus  focus  here  is  on  normalizing  by  the  power  of 
each  vector.  The  vector  g  normalized  by  its  power  is 


9  = 


9 

Via,  a)' 


(2.3) 


where  (g,g)  is  the  inner  product  of  g  with  itself.  Applying  normalization  to  equa¬ 
tion  (2.2),  results  in  the  normalized,  windowed  cross-correlation 


N 

c”(m){fisM  =  ^2g(n)h*(n  +  m-  i),  1  <  rn  <  2rmax  +  1.  (2.4) 

n= 1 


The  term  normalized,  windowed  cross-correlation  is  misleading  because  it  is  not  the 
cross-correlation  that  is  normalized  but  rather  the  vectors  being  correlated. 

Cross- covariance  is  similar  to  cross-correlation  except  that  cross-covariance  uses 
mean-removed  vectors.  The  un-normalized  cross-covariance  of  two  discrete  vectors 
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g{n )  and  h(n )  both  of  size  IV  is 


n(m){3,/z} 


r  m 

y^(g(n)  -  Mg)(h*(n  +  (N  —  m))  -  gh)  1  <  m  <  N 

n=  1 
27V— ra 

y,  (#(n  +  (m  —  iV))  -  ng){h*{n)  —  gh)  N  +  1  <  m  <  2N  -  1, 

k  71=1 

(2.5) 


where  /ig  and  /i/,  are  the  means  of  p(n)  and  h(n),  respectively.  The  terms  cross¬ 
correlation  and  cross-covariance  are  often  used  interchangeably  despite  their  distinct 
differences.  The  same  normalization  and  windowing  used  for  cross-correlation  can  be 
applied  to  the  cross- covariance  to  possibly  achieve  better  performance. 


Until  now,  all  cross-correlations  discussed  here  have  used  vectors,  but  cross¬ 
correlation  can  also  apply  to  2-D  functions,  such  as  images.  Equations  (2.1),  (2.2), 
and  (2.4)  can  be  expanded  to  the  2-D  case,  but  would  require  drastically  more  compu¬ 
tations.  To  resolve  this  computational  complexity  issue,  the  spectral  implementation 
of  the  cross-correlation  can  be  used.  Let  g  and  h  be  digital  images  and  let  GF  and 
H h  be  their  discrete  Fourier  transforms  (DFT),  respectively.  The  spectral  implemen¬ 
tation  of  the  cross-correlation  is  the  inverse  DFT  of  the  cross-power  spectrum  of  g 
and  h: 

C{g,h}  =  r1{GF(HF)*},  (2.6) 


where  is  the  inverse  DFT  and  GF(HF)*  is  the  cross-power  spectrum  of  g  and  h. 
The  DFT  and  inverse  DFT  are  fast  implementations  of  the  Fourier  transform  (FT) 
and,  therefore,  are  an  ideal  choice  for  2-D  cross-correlation. 


2.1.2  Phase  Correlation.  One  special  case  of  the  spectral  implementation 
of  the  cross-correlation  is  called  phase  correlation  [3].  As  its  name  implies,  phase 
correlation  measures  the  correlation  between  two  images  using  the  phase  of  the  FT. 
To  extract  the  phase  from  g  and  h,  the  cross-power  spectrum  is  normalized  by  its 
magnitude 

m  GF(HFr 

’  1  I  GF(HFy 
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Phase  correlation  is  the  inverse  FT  of  the  normalized  cross-power  spectrum 


/'{'/./-I  -r'w, 


(2.8) 


which  is  equivalent  to  the  normalized  spectral  implementation  of  the  cross-correlation 


(2.9) 


Relative  translative  movement  between  g  and  h  can  also  be  estimated  using  the  loca¬ 
tion  of  the  maximum  of  the  phase  correlation. 

2.1.3  Radon  Transform.  The  RT  recently  received  attention  for  use  in  a 
wide  range  of  applications,  one  of  which  is  image  registration.  There  are  several 


definitions  of  the  RT,  all  of  which  are  related.  The  Radon  transform  HR(r,  6)  for  a  2- 
D  continuous  function  h(x,  y )  is  found  by  computing  line  integrals  along  h,  where  the 


lines  are  defined  by  their  perpendicular  distance  from  the  origin,  r,  and  the  angle  that 
r  makes  with  the  horizontal  axis,  9,  as  in  Figure  2.1.  The  RT  of  a  two-dimensional 
function  h(x,  y )  is 


&{h{x  ,y)}  =11  h(x,y)S{ 


r  —  x  cos  6  +  ysmO)dxdy  =  HB(r,  6).  (2.10) 


For  discrete  applications  such  as  digital  image  registration,  the  line  integrals  in  equa¬ 
tion  (2.10)  are  replaced  by  projections,  and  interpolation  is  required  because  the 
projections  do  not  always  pass  through  the  centers  of  pixels.  There  are  many  al¬ 
gorithms  to  compute  the  discrete  RT.  Here,  the  discrete  RT  is  computed  using  the 
MATLAB®  command  radon,  as  in  Figure  2.2.  An  example  of  the  discrete  RT  is  shown 
in  Figure  2.3. 

The  power  of  the  RT  for  image  registration  comes  from  its  unique  properties 
with  respect  to  rotation,  translation,  and  scale.  The  2-D  image  /^(x,r/),  as  shown  in 
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Figure  2.1:  Illustration  of  the  continuous  RT.  The  line  integrals  across  h,  shown 

as  dotted  lines,  are  defined  by  their  distance  along  r  and  the  angle  r  makes  with  the 
x-axis,  9.  The  origin  for  the  RT  is  the  same  as  the  origin  of  the  the  function  h(x,y). 
The  result  of  the  RT  for  6  =  45°  is  also  shown.  This  figure  is  adapted  from  [8]. 
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Figure  2.2:  The  discrete  RT  of  an  image  is  the  sum  of  RTs  of  each  individual  pixel. 
The  radon  command  divides  each  pixel  in  the  image  into  four  subpixels,  each  with 
the  same  value  as  the  original  pixel,  and  projects  each  subpixel  separately.  Each 
subpixels  contribution  is  proportionally  split  into  the  two  nearest  bins  according  to 
the  distance  between  the  projected  location  and  the  bin  centers.  If  the  subpixel 
projection  intersects  the  center  of  a  bin,  the  bin  has  the  full  value  of  the  subpixel.  If 
the  subpixel  projection  intersects  the  border  between  two  bins,  the  subpixel  value  is 
divided  evenly  between  bins.  The  origin  of  the  RT  and  the  image  axes  is  the  center 
pixel  of  the  image.  This  figure  and  description  is  adapted  from  the  MATLAB®  help 
document  for  the  radon  command. 
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Figure  2.3:  The  RT  of  the  digital  image  f(x,y).  Also  shown  is  the  discrete  RT  for 
6  =  45°.  Note  the  similarity  to  Figure  2.1. 
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equation  (1.2),  can  be  represented  by  its  RT  as 

y)}  =  FR(r ,  0  -  cj>)  =  if  (r,  0).  (2.11) 

This  equation  shows  that  a  rotation  of  an  image  corresponds  to  a  translation  of  the  an¬ 
gular  variable  of  the  RT  [8],  as  in  Figure  2.4.  The  image  /r(x,  y ),  as  in  equation  (1.1), 
can  be  represented  by  its  RT  as 

D\{fr(x,y)}  =  FR(r  —  rx  cos  0  +  sin  0, 9)  =  if  (r,  0).  (2.12) 


Therefore,  translating  an  image  corresponds  to  a  translation  of  the  spatial  variable  of 
the  RT  by  an  amount  that  depends  on  both  the  image  translation,  [rx,Ty],  and  the 
angular  variable,  0,  as  in  Figure  2.5. 

However,  for  a  given  9  the  RT  of  a  translated  image  is  translated  along  the  spatial 
variable.  Certain  choices  of  9  lead  to  direct  estimation  of  [rx,Ty],  as  in  Figure  2.6. 
The  RT  of  fT(x,y )  evaluated  for  9  =  0  is 

if  (r,  0  =  0)  =  FR(r  —  rx  cos(0)  +  ry  sin(0),  0) 

(2.13) 

=  FR(r  -  tx,  0). 

This  choice  of  9  eliminates  the  effect  of  vertical  translation.  Therefore,  the  vector 
nfr  (x,y)}  for  9  =  0  is  translated  along  the  spatial  domain  by  rx  relative  to  the 
vector  D\{f(x,y)}  for  0  =  0. 

The  RT  of  fT(x,y )  evaluated  for  0  =  90°  is 


if  (r,  0  =  90°)  =  FR(r  —  tx  cos(90°)  +  ry  sin(90°),  90°)  ^ 

=  FR(r  +  Ty,  90°). 

This  choice  of  0  eliminates  the  effect  of  horizontal  translation.  Therefore,  the  vec¬ 
tor  9 \{fT(x,y)}  for  0  =  90°  is  translated  along  the  spatial  domain  by  ry  relative  to 
the  vector  9K{f(x,y)}  for  0  =  90°.  The  translations  rx  and  ry  can  be  directly  ex- 
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Figure  2.4:  The  RT  of  the  digital  image  f(x,y )  and  the  rotated  digital  image 

y )  displayed  together  for  visual  comparison.  The  rotated  image  is  by  25  degrees. 
Note  that  the  RT  of  the  rotated  image  is  by  shifted  <f>  =  25°  relative  to  the  original 
image. 
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tracted  using  the  location  of  the  maximum  of  the  cross-correlation  of  9{{f(x,y)}  and 
0\{fT(x,  y)}  for  9  =  0°  and  90°,  respectively. 

Lastly,  the  image  fp(x,y)  can  be  represented  by  its  RT  as 

=  ^FR(Pr,6)  =  FR(r,9).  (2.15) 

Scaling  an  image  corresponds  to  scaling  the  spatial  variable  of  the  RT  and  scaling 
the  intensity  values  [8] .  The  combination  of  these  properties  make  the  RT  a  powerful 
technique  for  image  registration. 

2.1.4  The  Fourier- Mellin  Transform.  The  FMT  is  an  application  of  the 
shifting  property  of  the  FT  combined  with  a  log-polar  coordinate  mapping.  The 
shifting  property  of  the  FT  states  that  a  spatial  shift,  r0,  corresponds  to  a  linear 
phase  change  in  the  spectral  domain  [12] 

${f(x-r0)}  =  F(ejw)e-jwT°.  (2.16) 

The  magnitude  of  the  FT  of  the  f(x  —  rG)  is 

I ${f(x  -  r0)} |  =  \F(e>w)e~iWTo\  =  F(e^)  =  (2.17) 

Therefore,  the  magnitude  of  the  FT  is  translation  invariant. 

This  same  property  can  be  applied  to  both  rotation  and  scaling  if  log-polar  co¬ 
ordinate  mapping  is  used.  Using  a  point  (x,  y )  in  the  real- valued  Cartesian  coordinate 
plane,  the  log-polar  mapping  is 


x  =  cos(0),  (2-18) 

y  =  sin(0),  (2-19) 
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(a)  <R{f(x,y)} 


(b)  D\{fT{x,y)} 


Figure  2.5:  The  RT  of  the  digital  image 
rx  =  10  pixels  and  ry  =  5  pixels,  fT(x,y), 
The  RT  of  the  translated  image  is  warped 
original  image. 


f(x,y )  and  the  same  image  translated  by 
displayed  together  for  visual  comparison, 
along  the  r  axis  relative  to  the  RT  of  the 
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Figure  2.6:  The  RT  the  original  image  f(x,y )  and  the  translated  image  fT(x,y)  for 
specific  values  of  9 ,  showing  that  rx  and  ry  can  be  estimated  from  their  RTs. 
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where  G  9?  is  the  scaling  factor  and  9  G  [0,  2n)  is  the  rotation  angle.  For  every 
H  and  6  there  is  a  unique  point  in  the  Cartesian  plane.  A  change  in  rotation  or 
scale  of  an  image  corresponds  to  a  translation  of  6  or  / 1 ,  respectively.  Therefore,  the 
magnitude  of  the  FT  of  the  log-polar  mapping  is  rotation  and  scaling  invariant  [1]  in 
accordance  with  equation  (2.17). 

The  FMT  is  the  FT  of  the  log-polar  mapping  of  the  magnitude  of  the  FT  of  a 
function,  as  shown  in  Figure  2.7.  The  FMT  of  a  2-D  function  h(x,y)  is 

M{h(x,y)}  =  d{HF(y,9)}  =  Hm(u,  v),  (2.20) 

where  HF(n,9)  is  the  log-polar  mapping  of  the  magnitude  of  the  FT  of  h(x,y).  The 
magnitude  of  the  FT  of  h(x,y)  is  not  represented  directly  in  equation  (2.20)  but  is 
implicit.  The  FMT  is  invariant  to  translation  because  it  uses  the  magnitude  of  the 
FT  of  h(x,y),  which  is  translation  invariant  as  described  above. 

The  magnitude  of  the  FMT  is  invariant  to  both  rotation  and  scale.  The  FMT 
of  the  image  fTtp^{x,  y),  (see  equation  (1.8)),  which  is  translated  by  the  vector  [rx,  ry\, 
scaled  by  (3 ,  and  rotated  by  0,  is 


M{fTM(x,y)}  =  FM( v)  (2.21) 

and  its  magnitude  is 

\F^{u,v)\  =  \FM(u,v)e-juhl0e~jv^\  =  FM(u,v).  (2.22) 

Therefore,  rotation  and  scaling  of  an  image  result  in  linear  phase  shifts  in  the  FMT. 
Notice  that  translation  does  not  appear  in  the  FMT  and  that  neither  rotation  nor 
scale  appear  in  the  magnitude  of  the  FMT,  thus  showing  that  the  FMT  is  invariant 
to  translation,  rotation,  and  scale. 
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h(x,y) 


H  (u,v) 

Figure  2.7:  Flow  chart  for  computing  the  FMT  for  the  2-D  function  h(x,y). 

2.2  Multi- Reference  Frame  Image  Registration 

Multi-reference  frame  image  registration  aims  to  minimize  or  eliminate  image 
biasing  and  to  be  more  robust  in  noisy  environments  than  traditional  single  reference 
frame  registration.  A  multi- reference  frame  image  registration  technique  for  esti¬ 
mating  translational  shifts  uses  the  location  of  the  maximum  of  the  two-dimensional 
cross-correlation  as  the  initial  estimator.  The  initial  estimator  is  the  single  refer¬ 
ence  frame  registration  technique  used  to  estimate  the  relative  distortion  between  all 
combinations  of  two  images  [2], 

An  initial  reference  image  is  chosen,  and  all  images  in  the  set  of  N  images 
are  compared  to  the  reference  image  to  obtain  relative  distortion  estimates.  Each 
reference  image  results  in  a  different  set  of  distortion  estimations,  as  in  Figure  2.8. 
This  procedure  is  repeated  for  all  images  in  the  set  using  a  different  reference  im¬ 
age  each  time.  To  ensure  comparable  distortion  estimates,  each  collection  of  relative 
estimates  from  a  particular  reference  image  are  normalized  to  zero  mean.  Final  dis¬ 
tortion  estimates  are  made  by  averaging  the  relative  distortion  estimates  with  respect 
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Figure  2.8:  The  result  of  distortion  estimation  using  different  reference  frames.  The 
estimations  are  made  from  5  frames  that  have  random  translative  distortion. 


to  dimensionality  and  reference  frame,  as  in  Figure  2.9.  This  averaging  requires  N2 
estimations,  which  implies  significantly  more  calculations  then  single  reference  frame 
registration  [2], 

The  process  is  applied  with  a  model  using  a  N  x  N  matrix  with  each  element  of 
the  matrix  corresponding  to  one  distortion  estimation.  Let  Cl{i,  j}  be  the  estimation 
of  some  distortion  between  image  %  and  image  j.  The  matrix  of  multi-reference  frame 
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Figure  2.9:  The  overlapped  results  of  translation  estimation  using  each  image  as 

a  reference.  This  figure  highlights  the  differences  in  estimation  when  using  different 
reference  frames.  Multi-frame  registration  combines  all  data  to  compute  the  distortion 
estimation  by  averaging  the  cluster  of  estimations.  The  means  of  the  clusters  are  the 
final  distortion  estimates  for  a  given  frame. 
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estimations  is 


0{1, 1} 

0(1,2}  •• 

•  n{i,N} 

^{2,1} 

0(2,  2} 

Cl{2,N} 

&{N,  1} 

fl{N,2}  ■■ 

■  fl{N,N} 

(2.23) 


Simplifications  can  be  made  using  symmetry.  Specifically,  the  distortion  between  an 
image  and  itself  is  zero  or  one, 


r2{z,  i}  =  0  or  1. 


(2.24) 


The  distortion  is  zero  for  translation  or  rotation,  and  one  for  scaling.  Also,  switching 
the  reference  frame  with  the  frame  to  be  registered  results  in  the  same  distortion 
estimate  only  negated  or  inverted, 


or 


(2.25) 


The  estimate  is  negated  for  translation  or  rotation,  and  inverted  for  scaling.  These 
simplifications  reduce  the  number  of  estimations  to  ( N 2  —  N)/ 2.  and  reduce  equa¬ 
tion  (2.23)  to 


n  = 


0  0{  1 ,  2} 

— h2{l,  2}  0 

-Cl{2,N} 


Cl{l,N} 

f}{2,iV} 


or 


1  0{1, 2} 


i 


0(1,2} 


Cl{l,N} 
f2{2,  iV} 


0(1,  N}  0(2  ,N} 


(2.26) 
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Another  consequence  of  these  simplifications  is  that  the  initial  distortion  estimates  do 
not  need  to  be  normalized  because  the  sum  of  all  values  in  each  matrix  is  zero.  The 
final  vector  of  distortion  estimates  is  calculated  by  averaging  the  columns  or  rows  of 
the  matrix  of  multi-reference  frame  estimations  [2]: 

N 

Cl(m,  n) 

nM(m)  =  - ,  Vm  e  {1,2, ,  N}.  (2.27) 

This  multi-reference  frame  registration  method  is  independent  of  the  distortion 
model  and  can  account  for  prior  knowledge  of  the  distortion  by  using  a  different  initial 
estimator.  Multi-frame  registration  is  an  unbiased  estimator  if  the  initial  estimator 
used  is  unbiased.  It  is  shown  that  the  multi-reference  frame  registration  for  shift 
estimation  (using  cross-correlation  as  the  initial  estimator)  performs  better  than  single 
reference  frame  cross-correlation  [2], 

2.2.1  Optimal  Gauss- Markov  Estimator.  The  new  multi-reference  frame  es¬ 
timation  technique  is  equivalent  to  the  optimal  Gauss-Markov  estimator  for  distortion 
estimation,  as  in  [13].  In  [13],  the  Gauss-Markov  estimator  is  used  for  time  difference 
of  arrival  estimation  in  radio  navigation  given  all  pairwise  estimations  of  delay.  The 
Gauss-Markov  estimate  for  the  vector,  D,  of  IV  —  1  true  distortions  relative  to  the 
first  image  is 

D  =  [ATPE]A]~lATDMl  (2.28) 

where 

D{  1,  k}  =  [ft{l,  k}}T,  for  k  G  {2, ... ,  N},  (2.29) 

Dm  is  the  vector  of  (IV2  —  N)/ 2  distortion  estimates  for  all  N  images  taken  two  at  a 
time  or 


DM{i,j}  =  [«{!,  2},  fi{l,  3},  •  •  •  ,  n{l,  N},  G{2,  3},  •  •  •  ,  Q{N  -  1, 1V}]T,  (2.30) 
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Pe  is  the  normalized  error  covariance  of  the  distortion  estimates  which  for  independent 
and  identically  distributed  estimates  is  equal  to  the  identity  matrix  of  size  (N2  —  N)/2 
or 

A  =  5{j,  k}  -  5{i,  k},  (2.31) 

and  6  is  the  Kronecker  delta  function 


<Hb  3} 


1,  for  i  —  j 
0,  for  i  7^  j 


(2.32) 


Using  the  multi-frame  estimates,  CIm,  it  can  be  shown  that  the  Gauss-Markov  esti¬ 
mator  is  equal  to  the  multi-frame  estimates  with  respect  to  the  first  image 


D  —  ClM(  1) 


Gm(  2) 
ClM(N) 


(2.33) 


Appendix  A  provides  an  example  of  the  calculations  that  lead  to  the  result  in  equa- 


III.  Experimental  Research  Methodology 


The  basic  methodology  for  the  experiments  developed  in  this  thesis  involves  single 
and  multi-reference  frame  registration  using  a  set  of  N  images.  Here,  single 
reference  frame  registration  is  also  called  the  single  frame  method,  and  multi-reference 
frame  registration  is  also  called  the  multi-frame  method.  The  set  of  N  images  are 
multiple  images  of  a  true  object  taken  at  different  times  and/or  perspectives,  which 
results  in  relative  geometric  distortions  of  translation,  rotation,  and/or  scale.  The 
final  results  of  registration  are  estimates  of  the  relative  distortions  and  an  estimate  of 
the  true  object  or  scene  for  both  the  single  and  multi-frame  methods.  The  goal  is  to 
compare  the  quality  of  the  scene  estimate  and  the  performance  of  the  methods  used 
for  image  registration. 

Comparisons  of  the  single  and  multi-frame  methods  are  made  for  different  com¬ 
binations  of  geometric  distortions  using  appropriate  registration  algorithms  to  es¬ 
timate  and  remove  the  distortions.  The  first  experiment,  focuses  on  data  that  has 
relative  translation  and  uses  cross-correlation  to  register  the  data  as  in  [2] .  The  second 
experiment,  uses  data  distorted  translationally  and  rotationally  and  registered  using 
the  RT  and  cross-correlation.  The  last  experiment,  uses  images  translated,  rotated 
and  scaled,  and  registered  using  the  FMT  and  phase  correlation.  For  each  experiment 
a  simulated  data  set  is  used  for  comparing  the  performance  of  single  and  multi-frame 
registration. 

3.1  Simulated  Experiment  Setup 

Simulated  data  is  used  to  compare  the  single  and  multi-frame  methods.  Initially, 
an  over-sampled  image  (1024  by  1024  pixels)  is  converted  to  a  grayscale  image,  J(x,  y). 
with  magnitude  values  from  0  to  255.  The  grayscale  image  is  randomly  distorted 
N  times  using  a  mathematical  model,  found  in  section  1.1.1,  to  simulate  geometric 
distortions  from  capturing  multiple  images  of  the  same  object.  This  procedure  results 
in  N  images  that  are  randomly  distorted  relative  to  a  true  image,  In(x,y,n),  and 
a  vector  of  the  true  distortion,  Q(n).  The  grayscale  undistorted  image  is  the  true 


29 


image.  Next,  Gaussian  white  noise  is  added  to  all  images.  Interpolation  is  required  to 
simulate  rotation  and  scale.  Also,  to  correct /remove  rotation  or  scale  interpolation  is 
required,  although  this  need  is  not  unique  to  simulated  data.  The  same  interpolation 
techniques  are  used  for  the  single  and  multi-frame  methods. 

Thus  the  simulated  data  consists  of  N  distorted,  noisy  images  and  one  undis¬ 
torted,  noiseless  image  (the  true  image).  An  unrealistic  limitation  of  simulated  data 
is  that  translation  can  only  be  an  integer  number  of  pixels  unless  computationally 
intense  interpolation  is  used.  In  reality,  multiple  images  of  the  same  target  has  non¬ 
integer  pixel  shifts  between  frames.  To  account  for  sub-pixel  shifts,  the  simulated 
data  is  downsampled.  Downsampling  also  has  the  benefit  of  reducing  the  computa¬ 
tional  time  needed  for  registration.  Downsampling  yields  the  final  simulated  data, 
fa(x,y,ri)  +  n(x,y,n),  consisting  of  N  noisy  images  that  are  randomly  and  realisti¬ 
cally  distorted  between  frames  and  one  undistorted,  noiseless  image,  f(x,y),  (the  true 
image),  all  of  which  are  256  by  256  pixels. 

The  simulated  data  is  registered  using  single  and  multi-frame  methods.  The 
results  of  registration  are  distortion  estimates,  D(n),  and  a  scene  estimate,  f(x,y). 
The  root  mean  square  error  (RMSE)  of  the  distortion  estimates  is  computed  and 
the  signal-to-noise  ratio  (SNR)  of  the  scene  estimate  is  computed.  The  SNR  and 
the  RMSE  provide  a  comparison  of  the  performance  of  the  single  and  mutli-frame 
registration  methods.  Figure  3.1  provides  a  flow  of  the  methodology  used  for  the 
simulated  experiments. 

3.1.1  Metrics.  Appropriate  metrics  are  required  to  compare  the  multi-frame 
and  single  frame  methods  for  the  simulated  experiments.  The  SNR  is  a  popular  image 
quality  metric;  it  is  the  ratio  of  signal  power  to  noise  power.  The  SNR  of  the  true 
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image,  f(x,y),  and  its  estimate,  f(x,y),  is 


SNR(fJ) 


N  M 


EE 

72—1  772—1 


f(n,  m) 


1 

RS 


R  S 


N  M 

EE  f(n,  m)  —  f(n,  m) 

U=  1  772—1 


(3.1) 


The  scene  estimate,  f(x,  y),  is  the  result  of  averaging  the  N  images  that  are  commonly 
aligned  using  the  distortion  estimates,  0(n).  However,  the  scene  estimates  and  the 
true  image  are  not  commonly  aligned  and,  therefore,  the  scene  estimates  are  registered 
and  aligned  to  the  true  image  before  calculating  the  SNR. 

The  RMSE  is  a  standard  metric  for  measuring  registration  accuracy.  For  the 
simulated  data,  the  geometric  distortions  are  artificially  applied  to  the  images,  and, 
therefore,  the  true  values  are  known.  The  distortion  estimates  from  the  two  methods 
can  be  compared  with  the  known,  true  distortion  values  when  using  simulated  data. 

The  distortion  estimates  from  the  two  methods  cannot  be  directly  compared 
to  the  true  distortion  values  because  the  true  values  and  estimated  values  have  a 
different  reference  point  from  which  they  are  measured.  Therefore,  the  differences  of 
true  and  estimated  values  are  used  to  compute  the  error.  A  difference  matrix  for  the 
true  distortion,  O(n),  is  computed,  where  each  element  corresponds  to  a  difference 
between  distortion  values, 


AO 


0(1)  -0(1) 

O(iV)  —  0(1) 


0(1)  —  0(A) 

O (N)  -  O (N) 


(3.2) 


and  similarly  for  the  estimates,  AO.  These  matrices  are  compared  using  the  RMSE. 
The  RMSE  between  the  known  distortion  differences,  AO,  and  the  estimated  distor- 


31 


tion  differences,  Ail,  is 


RMSE(  Ail,  Ail) 


N  N 


yy  (Ail(n,  m)  —  Af2(n,  m)Y 


\ 


71=  1  tti=1 


A2 


(3.3) 


However,  the  distortion  differences  have  zeros  along  the  diagonal,  which  creates  a 
bias  in  the  RMSE.  To  remove  this  bias,  the  zeros  along  the  diagonal  of  the  difference 
matrix  are  not  used  in  the  calculation  of  the  RMSE  leaving  only  N(N  —  1)  calculations 
and  resulting  in  a  new  RMSE 


RMSE  (Ail,  Ail) 


N 


N  N 

yy  £(AIl(n,m)  —  Ail (n,m))2 

71=1  771=1 
m^n 

A(A-  1) 


(3.4) 


Also,  the  error  is  calculated  using  ( N  —  l)2  more  calculations  then  needed  because 
the  vector  of  N  estimates  is  used  to  create  a  matrix  of  N2  estimates  .  Therefore,  the 
actual  RMSE  is  calculated  using  y^r^y/RMSE  or  y^y^RMSE.  The  RMSE  of  the 
distortion  estimates  is 


RMSE  (Aft,  Ail) 


N  N 

yy  yy  (Ail(n,  m)  —  AQ(n,  m))2 

\  71=1  771=1 

\  rriy^n 

~  (JV-1) 


(3.5) 


3.2  Experiment  for  Translative  Distortion 

This  section  further  develops  the  work  of  Bruckart  [2]  in  the  area  of  multi-frame 
image  registration  for  translation  only. 

The  methodology  for  the  experiment  for  translation  applies  the  spectral  im¬ 
plementation  of  the  2-D  cross  correlation,  equation  (2.6).  The  model  fT(x,y),  as  in 
equation  (1.1),  is  used  to  simulate  the  translative  distortion.  The  translation  [tx ,  ry] 
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I(x,y) 


Figure  3.1:  Process  flow  in  the  simulated  experiments  for  analyzing  the  perfor¬ 

mance  of  single  and  multi-frame  image  registration  methods.  The  chart  outlines  the 
process  of  using  an  oversampled  gray  scale  image,  I(x,y),  to  create  a  simulated  data 
set  of  randomly  distorted  images  and  to  register  them  using  single  and  multi-frame 
registration  methods. 
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5 


10 


Figure  3.2:  The  2-D  cross  correlation  of  the  image  f(x,y),  Figure  2.3(a),  and  the 
same  image  translated  by  tx  =  10  pixels  and  ry  =  5  pixels,  fT(x,y).  The  location  of 
the  maximum  is  indicated  by  the  labelled  markers,  which  correspond  to  the  estimates 
fx  and  fy,  in  accordance  with  equation  (3.6). 


is  estimated  from  the  location  of  the  maximum  of  the  cross-correlation  of  f(x,y)  and 
fr(x,y) 


[fx,fy]  =  argma xC{f(x,y),fT(x,y)}, 

x,y 


(3.6) 


as  in  Figure  3.2.  After  the  estimates  for  translation  are  computed,  the  distortion  can 
be  removed.  Once  the  shifts  are  removed  the  images  are  all  commonly  aligned  and  can 
be  averaged  together,  which  reduces  the  noise.  This  process  is  image  averaging.  The 
registration  is  performed  using  both  single  and  multi-frame  methods,  resulting  in  two 
reconstructed  images,  or  scene  estimates,  of  the  true  object.  The  goal  is  to  compare 
the  quality  of  the  two  scene  estimates  and  the  performance  of  the  multi-frame  and 
single  frame  methods. 

3.3  Experiment  for  Translative  and  Rotational  Distortion 

The  methodology  for  the  experiment  for  translation  and  rotation  combines  nor¬ 
malized,  windowed  1-D  cross  correlation  in  the  spatial  domain  and  properties  of  the 
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RT.  As  stated  in  section  2.1.3,  translations  in  an  image  correspond  to  a  translation  in 
the  spatial  parameter  of  the  RT  as  a  function  of  angle  parameter,  as  in  equation  (2.12). 
Also,  rotation  in  an  image  corresponds  to  a  translation  in  the  angle  parameter  of  the 
RT,  as  in  equation  (2.11).  An  effective  registration  method  is  developed  using  a 
combination  of  these  properties  for  images  that  are  rotated  and  translated. 

Several  papers  use  the  RT  or  a  combination  of  the  RT  and  other  methods  for 
image  registration  [8]  [9]  [10]  [11]  [14].  However,  few  of  those  papers  explicitly  and 
properly  deal  with  registering  images  that  are  rotated  and  translated.  The  few  that 
do  so  use  images  that  contain  a  white  shape  on  a  black  background,  have  different 
textures,  or  contain  lines  or  borders,  all  of  which  are  cases  where  the  RT  excels  in 
discriminating  translation  and/or  rotation.  When  realistic  images  are  used  these 
methods  do  not  work  accurately  for  a  wide  range  of  images.  Therefore,  it  is  necessary 
to  develop  a  novel  approach  for  rotation  and  translation  estimation  using  the  RT  for 
realistic  images. 

A  model  for  rotated  and  translated  images  is  needed  to  develop  the  new  method. 
There  is  a  difference  in  the  model  for  an  image  that  is  first  translated  and  then  rotated 
versus  an  image  that  is  first  rotated  and  then  translated,  as  indicated  in  section  1.1.1. 
The  2-D  image  fT,<j>(x,y),  equation  (1.4),  is  represented  by  its  RT  as 

M{fT,<t,(x>y)}  =  FR{r-Tx  cos(0  -  <j>)  +  Tvsm(d  -  </>),0  -  0)  =  F^(r,6).  (3.7) 

The  2-D  image  f^,T(x,y),  equation  (1.5),  can  be  represented  by  its  RT  as 

M{U,r(x>y)}  =  FR(r  —  t'x  cos(9  -  (f))  +  TySm(9  -  <j>),6  -  0)  =  F^T(r,9),  (3.8) 

where 


:  t'x  cos  0  +  Ty  sin  0, 

(3.9) 

—t'x  sin  4>  +  Ty  cos  0, 

(3.10) 
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and 


T„  =  rr  cos  0  —  r„  sin  ( 


Ty  —  Tx  sin  0  +  Ty  cos  < 


(3.11) 

(3.12) 


The  vector  [rx,ry}  is  the  rotational  transformation  of  the  vector  [t',t'].  The  order  of 
translation  and  rotation  does  not  matter  for  registering  the  images,  but  if  the  param¬ 
eters  0  and  [tx ,  ry]  are  to  be  estimated,  then  the  order  is  important.  For  simplicity, 
equation  (3.7)  is  used  as  the  model. 

This  model  is  developed  as  an  iterative  algorithm  to  estimate  the  relative  trans¬ 
lation  and  rotation  between  a  pair  of  images.  To  estimate  rx,  let  9  —  0  —  0°  and 


F^(r, 0  =  0)  =  FR(r  ~  Tx  cos(0)  +  Ty  sin(0),  0) 
=  FR[r  -  tx,  0). 


(3.13) 


Therefore,  if  0  is  known  then  the  vector  9t{fT,4>(x,y)}  for  d  =  0  is  translated  along 
the  spatial  domain  by  tx  relative  to  the  vector  9${f{x,y)}  for  9  =  0.  The  translation 
is  estimated  using  the  maximum  of  the  1-D  normalized,  windowed  cross-covariance 
between  the  pair  of  vectors 


fx((p)  =  argmaxn"(m){FR(r,O),FT^(r,0  =  0)}. 


(3.14) 


To  estimate  ry,  let  9  —  0  =  90°  and 

FrF  (r,  9  =  90°  +  0)  =  FR(r  —  tx  cos(90°)  +  ry  sin(90°),  90°) 

=  FR(r  +  Ty ,  90°). 


(3.15) 


Therefore,  if  0  is  known,  then  the  vector  91{/T^(x,  y)}  for  9  =  90°  +  0  is  translated 
along  the  spatial  domain  by  Ty  relative  to  the  vector  9{{f(x,y)}  for  9  =  90°.  The 
translation  is  again  estimated  using  the  maximum  of  the  1-D  normalized,  windowed 
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cross-covariance  between  the  pair  of  vectors 


Ty($)  =  argma xn”(m){FR(r,  90°),  F^(r,  6  =  90°  +  $)}.  (3.16) 

Since  </>  is  unknown,  the  previous  estimations  cannot  be  computed  directly.  How¬ 
ever,  with  an  iterative  approach  the  rotation  can  be  accurately  estimated.  Performing 
the  previous  steps  for  a  range  of  rotation  values  results  in  a  set  of  maxima  of  1-D  nor¬ 
malized,  windowed  cross-covariances  that  correspond  to  the  range  of  rotation  values, 
as  in  Figures  3.3  and  3.4.  The  location  of  the  highest  maxima  gives  the  estimate  for 
the  rotation 

4>  —  arg  max  M(<j>) ,  (3-17) 

where 

MM  =  maX«”(m){FH(r,0),F5(r,0  =  *)}  (3,18) 

and  (j)  ranges  from  —cj)max  to  4>max  incremented  by  Acp,  as  in  Figure  3.5.  Using  the 
estimate  of  (j)  from  equation  (3.17),  the  parameters  rx  and  ry  can  then  be  accurately 
estimated  as  described  in  equations  (3.14)  and  (3.16). 

The  1-D  normalized,  windowed  cross-covariance  is  used  to  estimate  the  distor¬ 
tion  parameters,  because  other  correlation  techniques  are  ineffective  for  a  wide  range 
of  digital  images  when  correlating  using  the  RT,  primarily  due  to  the  fact  that  the 
RT  of  most  digital  images  varies  drastically  in  magnitude.  Normalizing  eliminates 
the  effect  of  varying  magnitude  and  windowing  reduces  the  number  of  computations 
and  can  increase  the  accuracy  of  correlation.  Another  important  design  consideration 
when  using  the  RT  is  the  need  for  circular  windowing  of  the  images.  Digital  images 
have  rectangular  boundaries,  and  the  boundaries  create  a  fixed  pattern  in  the  RT, 
Figure  3.6.  Circular  windowing  eliminates  the  fixed  pattern  caused  by  the  boundaries 
of  the  image,  as  in  Figure  3.7. 

This  iterative  algorithm  requires  some  criteria.  First,  the  range  of  rotation 
values  must  be  wide  enough  to  account  for  the  maximum  possible  rotational  difference 
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(a)  (j)  =  0°  (b)  Aligned  for  Maximum  Correlation 


(c)  (ft  =  5°  (d)  Aligned  for  Maximum  Correlation 


Figure  3.3:  The  RTs  FR(r ,  0°)  and  FR^(r,  9  =  (ft)  for  two  values  of  (ft,  showing  that 
tx  can  be  estimated  using  the  RT.  The  image  f(x,y),  Figure  2.3(a),  is  translated  by 
tx  =  10  pixels  and  ry  =  5  pixels  then  rotated  by  (ft  =  5°.  When  (ft  =  (ft  the  estimation 
of  tx  is  easily  seen  because  ^{fT,<j>{x,y)}  is  translated  by  rx  relative  to  93{/(x,t/)}  in 
accordance  with  equation  (3.13). 
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(a)  (j)  =  10°  (b)  Aligned  for  Maximum  Correlation 


(c)  (j>  =  15°  (d)  Aligned  for  Maximum  Correlation 


Figure  3.4:  The  RTs  FR(r,  0°)  and  F^^r,  6  =  0)  for  another  two  values  of  0,  show¬ 
ing  that  if  0  7^  0  then  the  RTs  do  not  correlate  well.  The  image  f(x,  y ),  Figure  2.3(a), 
is  translated  by  rx  =  10  pixels  and  ry  =  5  pixels  then  rotated  by  0  =  5°,  illustrat¬ 
ing  that  the  RT  algorithm  can  accurately  estimate  the  translation  because  the  RTs 
correlate  best  when  0  =  0. 
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Figure  3.5:  Maximum  of  the  1-D  normalized,  windowed  cross-covariance  between 
FR(r ,  0°)  and  FR^(r,  9  —  <p)  for  a  range  of  0;  see  equation  (3.18).  The  estimate  of  </>  is 
the  value  of  0  at  the  maximum  of  M(</>);  see  equation  (3.17).  The  estimated  rotation, 
</>,  is  5  degrees,  which  equals  the  true  0. 
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(a)  b(x,y) 


(c)  b<j,(x,y) 


0  ( degrees ) 


(b)  VK{b{x,y)} 


0  ( degrees ) 

(d)  fR{b^(x,y)} 


Figure  3.6:  The  RT  of  an  image  of  a  brick  wall,  b(x,  y ),  and  the  same  image  rotated 
by  30  degrees,  b^x^y).  Note  that  the  RTs  both  have  a  similar  diamond-shaped 
pattern,  which  masks  the  translation  along  the  0-axis  due  to  the  rotation. 
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(a)  Windowed  b(x,  y) 


(c)  Windowed  b<j,(x,y) 


0  ( degrees ) 


(b)  Windowed  9\{b(x,y)} 


0  (degrees) 

(d)  Windowed  y)} 


Figure  3.7:  The  RT  of  a  windowed  image  of  a  brick  wall  and  the  same  windowed 
image  rotated  by  30  degrees.  With  windowing  the  diamond  pattern  is  removed  be¬ 
cause  the  edges  of  the  image  no  longer  contribute  to  the  magnitude  of  the  RT,  so  the 
translation  due  to  rotation  is  more  obvious. 
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between  any  two  images  in  the  set.  Second,  the  1-D  normalized,  windowed  cross¬ 
covariance  requires  that  the  assumed  maximum  translation  for  each  dimension  be 
large  enough  to  account  for  the  maximum  possible  translational  difference  between 
any  two  images  in  the  set.  Thus,  registration  always  fails  for  rotation  and  translation 
above  the  maximum  assumed  value. 

After  the  estimates  for  translation  and  rotation  are  computed,  the  distortions 
can  be  removed,  thus  aligning  the  images.  The  aligned  images  are  averaged  to  create  a 
scene  estimate.  There  is  a  scene  estimate  for  both  single  and  multi-frame  registration. 
The  goal  is  to  compare  the  quality  of  the  two  scene  estimates  and  the  performance  of 
the  registration  methods. 

3-4  Experiment  for  Translation,  Rotation,  and  Scale  Distortion 

The  methodology  for  the  experiment  for  translation,  rotation,  and  scale  com¬ 
bines  the  FMT  and  phase  correlation.  As  discussed  in  section  2.1.4,  a  scale  and/or 
rotation  corresponds  to  linear  shifts  in  the  phase  of  the  FMT,  as  in  equation  (2.21). 
Also,  translations  in  a  image  correspond  to  linear  shifts  in  the  phase  of  the  FT,  as 
in  equation  (2.16).  An  effective  registration  method  has  been  developed  by  Adam 
Wilrner  using  these  properties  and  is  used  for  registering  images  that  are  translated, 
rotated,  and  scaled. 

The  model  as  in  equation  (1.8),  is  used  to  simulate  the  distortion 

for  this  experiment.  The  algorithm  computes  the  FMT  of  two  images  and  uses  phase 
correlation  to  estimate  the  relative  rotation  and  scale,  which  is  accomplished  by  first 
computing  the  normalized  cross-power  spectrum 


3{FF(ii,6)}Z{FTFM(»,0)y 

\3{FF(n,e)}3{F?M(ji,e)}* 


(3.19) 


43 


and,  using  equation  (2.21),  the  normalized  cross-power  spectrum  is 


FM(u,  v)FM(u,  v)*e^u hl9+v<P) 


| FM{U,  V)FM(U,  Vyej{n\nl3+v<j>)  | 

FM(u,v)FM(u,v)*ej^ulnl3+v^  (3.20) 


|  FM(u,v)FM(u,v)* 

ej(u  In  &+v(t>) 


The  scale  and  rotation  are  estimated  from  the  maximum  of  phase  correlation 


[In  A  (p\  =  argma xP{FF(fi,9),  Ffp  <p(fi,9)} 

[1,0 

=  arg max $~1{p{FF (/i,  9),  F 0)}} 
[1,0 

=  arg  max  5“ 1  {e^“ ln  9+v,t>) } 

fj.,9 


(3.21) 


=  arg  max  <5(/i  +  ln  (3,  6  +  (ft) , 


as  in  Figure  3.8.  The  phase  correlation  of  the  FMT  in  Figure  3.8  does  not  result  in  a 
delta  function  because  the  log-polar  mapping  for  discrete  functions  requires  interpo¬ 
lation. 

The  log-polar  mapping  can  have  a  resolution  different  from  the  original  images. 
The  resolution  of  the  log-polar  mapping  determines  the  logarithmic  spacing  of  the 
translation  due  to  scaling  and  the  proportional  spacing  of  the  translation  due  to 
rotation.  For  scaling,  if  the  log-polar  mapping  has  a  resolution  of  256  pixels  for 
the  scale  axis,  the  pixels  correspond  to  logarithmically  spaced  values  from  log(l) 
to  log(128).  For  this  algorithm,  the  logarithmic  spacing  only  extends  to  half  the 
resolution  of  the  scale  axis  because  beyond  half  the  resolution  the  scale  is  less  than 
one.  For  rotation,  if  the  log-polar  mapping  has  a  resolution  of  512  pixels  for  the 
angular  axis  then  the  translation  of  a  pixel  corresponds  to  ~  0.703°  per  pixel. 

The  relative  rotation  and  scale  are  removed  from  the  image  that  is  reduced 
in  scale.  The  smaller  scaled  image  is  used  to  preserve  the  size  of  the  images  after 
each  is  adjusted  for  rotation  and  scale.  The  images  now  only  have  relative  translative 
distortion.  The  translation  is  estimated  using  phase  correlation  of  the  images  after  the 
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Figure  3.8:  Phase  correlation  of  the  FMT  of  f(x,y ),  as  in  Figure  2.3(a),  and  the 
FMT  of  the  same  image  translated  by  tx  =  10  pixels  and  ry  =  5  pixels,  scaled  by 
/ 3  =  1.1,  and  rotated  by  0  =  5°,  fT,p,q i(x,  y ).  The  location  of  the  maximum  is  indicated 
by  the  labelled  markers,  which  correspond  to  the  estimates  [3  and  0,  in  accordance 
with  equation  (3.21).  The  pixels  of  the  vertical  axis  correspond  to  logarithmically 
spaced  values  of  scale  and  the  labelled  marker  for  scale  reflects  this  spacing.  The 
pixels  of  the  horizontal  axis  correspond  to  equal  proportions  of  360°. 


rotation  and  scale  have  been  removed  from  the  smaller  scaled  image.  The  cross-power 
spectrum  of  f(x,y)  and  fT(x,y)  is 


p{f(x,y),fT(x,y)} 


Ff(u,v)F?(u,v)* 

| FF(u,  v)FF(u,  u)*| 

Ff(u ,  v)FF(u,  v)*ej(UT*+VTv > 
| FF(u,  v)FF(u,  v)*e^UTx+VTy'> 

ej(UTx+VTy ) 


(3.22) 


The  relative  translation  is  also  estimated  using  the  maximum  of  phase  correlation 


[tx,ty}  =  argma xP{f(x,y),fT(x,y)} 


x,y 


=  arg  max  $  {p{f(x,y),fT(x,y)}} 

x,y 


-1  5  pi  {^TX+VTy) 


} 


=  arg  max  $  (e 
x,y 


=  arg  max  8(x  +  rx,y  +  ry), 
x,y 


(3.23) 


as  in  Figure  3.9.  The  translative  distortions  are  then  removed,  aligning  the  images. 
The  aligned  images  are  averaged  to  create  a  scene  estimate.  There  is  a  scene  estimate 
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Figure  3.9:  Phase  correlation  of  the  FMT  of  f(x,y )  and  the  FMT  of  fT,p,tj>  after  it 
has  been  adjusted  for  rotation  and  scale  by  the  estimates  labelled  on  Figure  3.8.  The 
location  of  the  maximum  is  indicated  by  the  labelled  markers,  which  correspond  to 
the  estimates  tx  and  fy,  in  accordance  with  equation  (3.23). 

for  both  single  and  multi-frame  registration.  The  goal  is  to  compare  the  quality  of 
the  two  scene  estimates  and  the  performance  of  the  registration  methods. 


46 


IV.  Results  and  Analysis 


For  each  experiment,  statistics  on  the  accuracy  and  quality  of  the  registration 
methods  are  generated  using  multiple  simulated  data  sets.  Each  simulated  data 
set  is  defined  by  the  model  used  to  generate  distortions,  the  maximum  value  of  those 
distortions,  the  variance  of  the  Gaussian  white  noise,  and  the  number  of  simulations 
performed  to  generate  the  statistics.  Table  4.1  summarizes  the  parameters  for  each 
experiment.  The  random  translational  and  rotational  distortion  are  generated  using 
uniform,  zero-mean  random  generators  from  —rmax  to  Tmax  and  —(j)max  to  (ftmax ,  re¬ 
spectively.  For  the  experiment  using  the  RT,  the  rotation  estimates  depend  on  the 
increments  of  rotation  values,  Acj),  as  in  equation  (3.18).  The  random  scaling  is  gen¬ 
erated  using  unform  random  generators  from  1  to  some  maximum  scale,  f3max.  The 
same  true  image  is  used  for  each  experiment  to  provide  comparable  results.  The  true 
image  is  from  NASA  and  is  a  satellite  image  of  crops  in  Kansas,  United  States  [15]. 
Each  experiment  produces  relatively  similar  results,  therefore,  the  majority  of  the 
analysis  is  applied  to  all  experiments. 

The  cross-correlation  experiment  has  a  large  noise  variance  to  demonstrate  the 
robustness  of  cross-correlation  against  noise.  The  added  noise  is  random  and  unlikely 
to  correlate.  Also,  with  only  translative  distortion  the  experiment  is  relatively  simple. 
The  decreased  noise  variance  for  the  FMT  experiment  is  because  the  algorithm  is  not 
designed  to  handle  noisy  data.  The  algorithm  is  only  designed  as  a  demonstrative 


2-D  Cross- 

Radon 

Fourier-Mellin 

Correlation 

Transform 

Transform 

Experiment 

Experiment 

Experiment 

Mathematical  Model 

fr(x,y) 

frAX’V) 

fr,<j>,fi(Xj  2/) 

Noise  Variance 

60 

20 

10 

#  of  Simulations 

1000 

10000 

10000 

Max  Translation,  2 rmax 

25  pixels  (10%) 

16  pixels  (6%) 

16  pixels  (6%) 

Max  Rotation,  20max 

16° 

16° 

Rotation  Increment,  A 0 

1° 

Max  Scale,  /3max 

130% 

Table  4.1:  Summary  of  the  simulation  parameters  used  for  each  experiment. 
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tool  for  the  applications  of  the  FMT  and,  therefore,  performs  poorly  with  noisy  data. 
This  result  does  not  indicate  that  the  FMT  is  a  poor  technique  for  noisy  data,  but 
rather  that  only  this  particular  implementation  of  the  FMT  is  a  poor  technique  for 
noisy  data. 

The  results  show  an  obvious  improvement  in  both  registration  accuracy  and 
quality  of  the  scene  estimate  for  the  multi-frame  method;  see  Figures  4.1-  4.12.  This 
result  is  expected  because  an  unbiased  registration  method,  on  average,  performs 
better  than  a  biased  method.  More  interesting  is  the  non-linear  improvement  in  reg¬ 
istration  accuracy  for  the  multi-reference  frame  estimator  with  respect  to  using  more 
frames.  The  registration  accuracy  and  quality  statistics  are  computed  for  different 
frame  set  sizes  as  indicated  by  the  independent  axis  of  the  figures.  The  goal  is  to 
analyze  how  the  methods  perform  when  more  frames  are  added.  However,  the  statis¬ 
tics  for  different  frame  set  sizes  are  not  independent,  which  is  intentional.  For  one 
simulation,  the  metrics  are  computed  for  3  frames  then  the  same  3  frames  are  used 
with  another  frame  added,  the  metrics  re-computed,  and  so  on.  The  idea  is  that  more 
data  should  produce  better  results. 

It  is  known  that  the  single  frame  method  is  not  more  accurate  with  more  frames, 
which  is  supported  by  the  results.  This  effect  is  a  byproduct  of  the  bias  of  the  single 
frame  method,  which  highlights  a  key  benefit  of  the  multi-frame  approach.  The  multi¬ 
frame  method  averages  all  the  available  frames,  resulting  in  better  accuracy  because 
there  is  no  bias.  The  non-linear  improvement  of  the  multi-frame  method  is  not  only 
because  it  is  unbiased,  but  also  because  with  more  frames  it  averages  all  combinations 
of  frames,  which  provides  a  boost  in  performance  when  more  frames  are  used.  With  N 
frames  the  single  frame  method  uses  only  N  —  1  estimations.  The  multi-frame  method 
averages  over  (iV2  —  N)/ 2  estimations.  This  averaging  over  more  estimations  results 
in  less  error  and  gives  the  multi-frame  method  the  capability  of  subpixel  estimation. 

The  multi-frame  method  produces  a  better  scene  estimate  as  a  result  of  the 
improved  registration  accuracy.  The  gain  in  image  quality  is  less  drastic  for  the  cross- 
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correlation  experiment  because  both  methods  are  accurate  to  less  than  one  pixel.  It 
may  seem  exceptional  that  the  image  quality  of  the  scene  estimates  from  the  single 
frame  method  improve  as  more  frames  are  added,  since  the  registration  accuracy  is 
constant,  (see  Figures  4.2  and  4.3).  However,  the  scene  estimate  image  quality  for 
the  single  frame  method  still  improves  with  more  frames  because  more  images  are 
used  in  averaging,  which  decreases  the  effects  of  the  additive  noise.  Also,  averaging 
reduces  the  effect  of  a  registration  error,  therefore,  even  without  noise  the  single  frame 
method  generates  an  improved  scene  estimate  as  more  frames  are  added. 

The  results  from  the  experiments  clearly  show  the  benefits  of  a  multi-frame 
registration  technique.  The  primary  disadvantage  is  that  the  multi-frame  technique 
requires  considerably  more  computational  time  due  to  the  increased  number  of  esti¬ 
mations  required,  as  in  Figure  4.13,  which  is  especially  true  for  large  data  sets.  If  a 
real-time  application  is  required  this  method  might  prove  too  costly,  but  with  current 
and  future  computer  performance  advances  this  disadvantage  is  less  of  a  factor. 
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(a)  True  Image 


(b)  Sample  Noisy  Frame 


(c)  Multi-Frame  Estimate 


(d)  Single  Reference  Frame  Estimate 


Figure  4.1:  Scene  estimates  of  the  multi-frame  and  single  frame  registration  meth¬ 
ods  from  the  cross-correlation  experiment.  Scene  estimates  are  made  using  24  frames 
of  data.  Also  shown  are  the  true  image  and  a  sample  frame  from  the  noisy  data. 
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RMSE  of  Vertical  Translation  RMSE  of  Horizontal  Translation 


Figure  4.2:  Registration  error  of  translation  estimates  (in  pixels)  from  the  cross¬ 

correlation  experiment  using  the  aerial  image  of  a  crop  field. 
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Figure  4.3:  Quality  of  the  scene  estimates  from  the  cross-correlation  experiment 

using  the  aerial  image  of  the  crop  field.  The  SNR  is  computed  using  a  linear  scale. 
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(a)  True  Image 


(b)  Sample  Noisy  Frame 


(c)  Multi-Frame  Estimate  (d)  Single  Reference  Frame  Estimate 

Figure  4.4:  Scene  estimates  of  the  multi-frame  and  single  frame  registration  meth¬ 
ods  from  the  RT  experiment.  Scene  estimates  are  made  using  24  frames  of  data.  Also 
shown  are  the  true  image  and  a  sample  frame  from  the  noisy  data. 
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RMSE  for  Vertical  Translation  RMSE  for  Horizontal  Translation 


Figure  4.5:  Registration  error  of  translation  estimates  (in  pixels)  from  the  RT 

experiment  using  the  aerial  image  of  a  crop  field. 


54 


Figure  4.6:  Registration  error  of  rotation  estimates  (in  degrees)  from  the  RT  ex¬ 

periment  using  the  aerial  image  of  a  crop  field. 


Figure  4.7:  Quality  of  the  scene  estimates  from  the  RT  experiment  using  the  aerial 
image  of  a  crop  held.  The  SNR  is  computed  using  a  linear  scale. 
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(a)  True  Image 


(b)  Sample  Noisy  Frame 


(c)  Multi-Frame  Estimate  (d)  Single  Reference  Frame  Estimate 

Figure  4.8:  Scene  estimates  of  the  multi-frame  and  single  frame  registration  meth¬ 
ods  from  the  FMT  experiment.  Scene  estimates  are  made  using  24  frames  of  data. 
Also  shown  are  the  true  image  and  a  sample  frame  from  the  noisy  data.  The  sample 
noisy  frame  visually  appears  to  be  higher  quality  than  the  scene  estimates.  However, 
the  SNR  of  the  individual  noisy  frame  is  approximately  20,  which  is  significantly  lower 
than  the  SNR  for  the  single  and  multi-frame  scene  estimates  of  approximately  40.5 
and  33.5,  respectively.  Also,  the  SNR  of  the  sample  noisy  frame  corresponds  to  the 
trend  shown  in  Figure  4.12. 
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RMSE  for  Vertical  Translation  RMSE  for  Horizontal  Translation 


Figure  4.9:  Registration  error  of  translation  estimates  (in  pixels)  from  the  FMT 

experiment  using  the  aerial  image  of  a  crop  field. 
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Figure  4.10:  Registration  error  of  rotation  estimates  (in  degrees)  from  the  FMT 

experiment  using  the  aerial  image  of  a  crop  held. 


Number  of  Frames 

Figure  4.11:  Registration  error  of  scale  estimates  from  the  FMT  experiment  using 
the  aerial  image  of  a  crop  held.  Since  scale  estimates  are  an  estimate  of  the  relative 
size  ratio,  the  lower  limit  for  scale  estimation  is  1. 
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Figure  4.12:  Quality  of  the  scene  estimates  from  the  FMT  experiment  using  the 

aerial  image  of  the  crop  field.  The  SNR  is  computed  using  a  linear  scale.  The  curves 
flatten  because  the  registration  error  outweighs  the  effect  of  the  image  averaging, 
especially  since  low  noise  is  added  to  the  images. 
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Figure  4.13:  Comparison  of  the  number  of  estimations  required  for  both  registration 
methods,  emphasizing  the  main  disadvantage  of  the  multi-frame  method.  Consider¬ 
ably  more  estimations  are  required  for  large  data  sets. 
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V.  Conclusions 


Image  distortions  such  as  translation,  scale,  and  rotation  potentially  obstruct  the 
ability  to  gather  useful  information  from  a  set  of  images.  Three  experiments  are  de¬ 
veloped  to  simulate  systems  that  introduce  different  combinations  of  distortion  and 
noise.  The  first  uses  2-D  cross  correlation  to  estimate  translation.  The  second  exper¬ 
iment  employs  a  new  registration  technique  that  uses  the  RT  to  estimate  translation 
and  rotation.  The  final  experiment  uses  the  FMT  and  phase  correlation  to  estimate 
translation,  rotation,  and  scale.  In  each  experiment,  single  and  multi-frame  image 
registration  are  used  to  estimate  distortions  and  perform  frame  averaging.  A  dis¬ 
tinct  and  noticeable  improvement  in  both  registration  accuracy  and  quality  of  frame 
averaging  is  observed  and  quantified.  The  new  multi-frame  registration  method  is 
shown  to  be  reliable  even  for  the  registration  algorithms  that  perform  poorly.  Also, 
the  unbiased  estimation  and  utilization  all  available  data  for  the  multi-frame  method 
proved  it  to  be  superior  over  traditional  registration. 

The  results  provide  further  justification  that  the  new  method  is  superior  and 
maintains  its  superiority  through  a  wide  range  of  applications.  It  is  shown  that  the 
multi-frame  method  displays  increased  performance  with  increased  number  of  images 
but  with  decreased  processing  speed;  thus  it  is  ideal  for  post  processing  applications. 
Multi-frame  registration  could  still  be  used  in  pseudo-real-time  applications,  because 
a  window  of  data  could  be  processed  to  exercise  the  benefits  of  the  new  method. 
Furthermore,  it  is  shown  that  the  new  registration  technique  is  equivalent  to  the 
optimal  Gauss-Markov  estimator  for  relative  distortion  in  images. 

5.1  Future  Research 

Future  research  on  multi-reference  frame  registration  is  needed  to  realize  its  full 
potential.  Research  in  the  area  of  the  real-time  applications  using  measured  (non- 
simulated)  data  is  needed.  Multi-frame  registration  could  be  applied  to  registration 
algorithms  that  handle  more  complex  distortions,  such  as  images  from  different  view¬ 
points  and/or  multiple  sensors.  For  example,  image  fusion  combines  images  from 
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different  viewpoints  and/or  sensors  to  create  a  larger  2-D  image  or  3-D  representa¬ 
tion  of  the  scene,  and  misregistration  results  in  poor  construction  of  the  composite 
image.  Multi-frame  registration  could  improve  this  process.  The  multi-frame  method 
could  also  be  applied  to  feature  detection  or  feature  matching  algorithms.  These  ap¬ 
plications  sometimes  require  accurate  detection  of  multiple  features  and  are  sensitive 
to  image  degradations  and  different  imaging  conditions.  Another  area  that  may  prove 
fruitful  is  analyzing  systems  that  have  unusual  noise  patterns  or  corruption.  These 
systems  could  benefit  greatly  from  the  unbiased  estimation  of  the  multi-frame  method. 
It  may  be  interesting  to  investigate  systems  with  non-uniform  distributions  of  distor¬ 
tion.  In  particular,  it  may  be  possible  to  account  for  knowledge  of  the  distribution 
and  improve  registration  performance. 
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Appendix  A.  Calculations  for  Optimal  Gauss- Markov  Estimation 


The  new  multi-frame  estimation  method  is  an  optimal  Gauss-Markov  estimator 
for  N— 1  relative  distortion  in  N  images  given  the  (N2—N)/2  pairwise  distortion 
estimates;  see  section  2.2.1.  Here  calculations  are  performed  for  a  specific  case,  and 
the  results  for  this  case  can  be  generalized  to  any  N. 

For  N  =  4,  the  vector  of  true  distortions  with  respect  of  the  first  image  is 

D{  1,  k}  =  [fi(l,  2},  fi(l,  3},  fi(l,  4}]T,  (A.l) 

and  the  vector  of  ( IV 2  —  N)/ 2  distortion  estimates  for  all  N  images  taken  two  at  a 
time  is 


Avf  I)',.?}  =  [G{1,2},G{1,3},H{1,4},G{2,3},G{2,4},G{3,4}]t  (A.2) 


The  matrices  A  and  Pe  are 


A 


5(2,  2}  -5(1,2} 
5(3, 2} -5(1, 2} 
A{4, 2}  -  A{1,2} 
A{3,2}  -  A{2,2} 
5(4,2} -5(2,2} 
5(4, 2} -5(3, 2} 
1  0  0 
0  1  0 
0  0  1 
-1  -1  0 
-1  0  1 
0  -1  1 


5(2, 3} -5(1, 3} 
5(3, 3} -5(1, 3} 
5(4, 3} -5(1, 3} 
5(3, 3} -5(2, 3} 
5(4, 3} -5(2, 3} 
5(4, 3} -5(3, 3} 


5(2, 4} -5(1, 4} 
5(3, 4} -5(1, 4} 
5(4, 4} -5(1, 4} 
5(3, 4} -5(2, 4} 
5(4, 4} -5(2, 4} 
5(4, 4} -5(3, 4} 


(A.3) 
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and 
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0 
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0 
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0 
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0 

0 

0 

0 

0 

0 
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The  optimal  Gauss-Markov  estimator  for 

D 

is 

(A.4) 


D  =[AtP^A\~1AtDm 

§6(1,  2}  +  1,  3}  +  4}  -  ±6(2,  3}  -  4-0(2,  4}  +  00(3,  4} 

=  |6{1,  2}  +  §6(  1,  3}  +  4}  +  |6{2,  3}  -  0^(2,  4}  -  |0{3,  4}  . 

|6{1,  2}  +  ±6(1,  3}  +  §6(  1, 4}  +  00(2,  3}  -  f  6(2, 4}  +  |0{3, 4} 

(A.5) 

The  same  result  is  achieved  using  the  multi-frame  estimations.  The  matrix  of 
N2  distortions  estimates  for  all  N  images  taken  two  at  a  time  is 


0 

6(1,2} 

6(1,3} 

6(1,4} 

6(1,2} 

0 

6(2,3} 

6(2,4} 

6(1,3} 

-6(2,3} 

0 

6(3,4} 

6(1,4} 

-6(2,4} 

-6(3,4} 

0 

The  multi-frame  estimates  are  calculated  by  averaging  along  the  rows,  as  in  equa¬ 
tion  (2.27), 

in{i,2}  +  !n{i,3}  + in{i,4} 
a  -£6(1,2} +  16(2,3} +  16(2,4} 

'  'm  =  „  „  „  ■  \A-‘> 

-|n{l,3}-4n{2,3}+|Q{3,4} 

-16(1,4} -16(2,4} -16(4,4} 
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The  N  —  1  distortion  estimates  may  now  be  computed  relative  to  the  first  image  by 
subtracting  the  first  image  estimates,  from  all  other  multi-frame  estimates: 


D 


—  ^m(I) 


&m(  2) 
&m f(3) 
&m(  4) 


§6{1,  2}  +  1^(1,  3}  +  Jn{l,  4}  -  ln{2,  3}  -  |0{2, 4}  +  00(3,  4} 
I«{1,  2}  +  ^{1,  3}  +  |n{l,  4}  +  \Cl{2,  3}  -  0fl{2, 4}  -  ±fi{3, 4} 
2}  +  3}  +  ^{1, 4}  +  0O{2,  3}  -  \Cl{ 2, 4}  +  |0{3, 4} 
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